A Spectrally Weighted Mixture of Least Square Error and Wasserstein Discriminator Loss for Generative SPSS
Generative networks can create an artificial spectrum based on an estimate of its conditional distribution, instead of predicting only the mean value as the Least Square (LS) solution does. This is promising, since the LS predictor is known to over-smooth features, leading to muffling effects. However, modeling a whole distribution instead of a single mean value requires more data and thus more computational resources. With only one hour of recording, as often used with LS approaches, the resulting spectrum is noisy and sounds full of artifacts. In this paper, we propose a new loss function that mixes the LS error with the loss of a discriminator trained as a Wasserstein GAN, weighting this mixture differently across the frequency domain. Using listening tests, we show that, with this mixed loss, the generated spectrum is smooth enough to obtain a decent perceived quality. By making our source code available online, we also hope to make generative networks more accessible by lowering the necessary resources.
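A minimal sketch of what such a spectrally weighted mixture could look like in PyTorch; the per-bin critic scores and the weight vector `w` are illustrative assumptions, not the paper's exact formulation, and the critic's own WGAN training step is omitted:

```python
import torch

def spectrally_weighted_loss(pred, target, critic, w):
    """Mix a per-frequency-bin LS error with a Wasserstein critic term.

    pred, target: (batch, frames, bins) spectra
    critic:       assumed to map a spectrum to per-bin scores, (batch, bins)
    w:            (bins,) mixing weight in [0, 1]; w=1 is pure LS, w=0 pure adversarial
    """
    ls = ((pred - target) ** 2).mean(dim=1)   # per-bin least-squares error, (batch, bins)
    adv = -critic(pred)                       # generator ascends the critic's score
    return (w * ls + (1.0 - w) * adv).mean()  # frequency-dependent mixture
```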
Sentiment Perception Adversarial Attacks on Neural Machine Translation Systems
With the advent of deep learning methods, Neural Machine Translation (NMT) systems have become increasingly powerful. However, deep-learning-based systems are susceptible to adversarial attacks, where imperceptible changes to the input can cause undesirable changes at the output of the system. To date, there has been little work investigating adversarial attacks on sequence-to-sequence
systems, such as NMT models. Previous work in NMT has examined attacks with the
aim of introducing target phrases in the output sequence. In this work,
adversarial attacks for NMT systems are explored from an output perception
perspective. Thus the aim of an attack is to change the perception of the
output sequence, without altering the perception of the input sequence. For
example, an adversary may distort the sentiment of translated reviews to have
an exaggerated positive sentiment. In practice it is challenging to run
extensive human perception experiments, so a proxy deep-learning classifier
applied to the NMT output is used to measure perception changes. Experiments
demonstrate that the sentiment perception of NMT systems' output sequences can
be changed significantly.
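A minimal sketch of the proxy-classifier evaluation described above; `nmt` and `classifier` are stand-ins for a translation system and a downstream sentiment model, not components named in the paper:

```python
def perception_shift(nmt, classifier, src, adv_src):
    """Measure how much an adversarial input changes the perceived sentiment
    of the translation, using a proxy classifier in place of human judges."""
    p_clean = classifier(nmt(src))    # e.g. P(positive) for the clean translation
    p_adv = classifier(nmt(adv_src))  # same score after the adversarial perturbation
    return p_adv - p_clean            # large shift => successful perception attack
```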
Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness
Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulate an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform alternative ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. First, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling Prior Networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training. It is shown that the construction of successful adaptive whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 using the proposed approach requires a greater amount of computational effort than against networks defended using standard adversarial training or MC-dropout.
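The reverse criterion rests on the closed-form KL divergence between two Dirichlet distributions. A sketch of that computation in PyTorch; putting the model's concentrations in the first argument reflects the "reverse" direction, and the variable naming is an assumption for illustration:

```python
import torch

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ), batched over the first dimension.

    alpha, beta: (batch, num_classes) positive concentration parameters.
    """
    a0 = alpha.sum(-1)
    b0 = beta.sum(-1)
    return (torch.lgamma(a0) - torch.lgamma(alpha).sum(-1)
            - torch.lgamma(b0) + torch.lgamma(beta).sum(-1)
            + ((alpha - beta)
               * (torch.digamma(alpha) - torch.digamma(a0).unsqueeze(-1))).sum(-1))

# Reverse-KL training puts the model's Dirichlet first:
# loss = dirichlet_kl(alpha_model, alpha_target).mean()
```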
Deliberation Networks and How to Train Them
Deliberation networks are a family of sequence-to-sequence models, which have
achieved state-of-the-art performance in a wide range of tasks such as machine
translation and speech synthesis. A deliberation network consists of multiple
standard sequence-to-sequence models, each one conditioned on the initial input
and the output of the previous model. During training, there are several key
questions: whether to apply Monte Carlo approximation to the gradients or the
loss, whether to train the standard models jointly or separately, whether to
run an intermediate model in teacher forcing or free running mode, and whether
to apply task-specific techniques. Previous work on deliberation networks
typically explores one or two training options for a specific task. This work
introduces a unifying framework, covering various training options, and
addresses the above questions. In general, it is simpler to approximate the
gradients. When parallel training is essential, separate training should be
adopted. Regardless of the task, the intermediate model should be in free
running mode. For tasks where the output is continuous, a guided attention loss
can be used to prevent degradation into a standard model.
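A minimal sketch of the two-pass structure and the free-running recommendation above; the model interfaces are assumptions for illustration:

```python
def deliberation_forward(first_pass, second_pass, x):
    """Two-pass deliberation: the second model sees both the input and the
    first model's output. Per the findings above, the intermediate model runs
    free (conditioned on its own predictions) rather than teacher forced."""
    y1 = first_pass.generate(x)  # free running: condition on own back-history
    y2 = second_pass(x, y1)      # refine using the input and first-pass output
    return y2
```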
Parallel Attention Forcing for Machine Translation
Attention-based autoregressive models have achieved state-of-the-art
performance in various sequence-to-sequence tasks, including Text-To-Speech
(TTS) and Neural Machine Translation (NMT), but can be difficult to train. The
standard training approach, teacher forcing, guides a model with the reference
back-history. During inference, the generated back-history must be used. This
mismatch limits the evaluation performance. Attention forcing has been
introduced to address the mismatch, guiding the model with the generated
back-history and reference attention. While successful in tasks with continuous
outputs like TTS, attention forcing faces additional challenges in tasks with
discrete outputs like NMT. This paper introduces two extensions of
attention forcing to tackle these challenges. (1) Scheduled attention forcing
automatically turns attention forcing on and off, which is essential for tasks
with discrete outputs. (2) Parallel attention forcing makes training parallel,
and is applicable to Transformer-based models. The experiments show that the
proposed approaches improve the performance of models based on RNNs and
Transformers.
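A minimal sketch of one decoding step under attention forcing, as contrasted with teacher forcing; the `decode_step` interface is an assumption:

```python
def attention_forcing_step(model, y_prev_generated, ref_attention, encodings):
    """One decoder step guided by the model's own generated back-history
    (as at inference time) but with the reference attention forced, keeping
    the alignment on track while removing the train/test mismatch."""
    return model.decode_step(y_prev_generated, encodings, attention=ref_attention)
```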
Minimum Bayes' Risk Decoding for System Combination of Grammatical Error Correction Systems
For sequence-to-sequence tasks it is challenging to combine individual system
outputs. Further, there is often a mismatch between the decoding criterion
and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used
to combine system outputs in a manner that encourages better alignment with the
final assessment criterion. This paper examines MBR decoding for Grammatical
Error Correction (GEC) systems, where performance is usually evaluated in terms
of edits and an associated F-score. Hence, we propose a novel MBR loss function
directly linked to this form of criterion. Furthermore, an approach to expand
the possible set of candidate sentences is described. This builds on a current
max-voting combination scheme, as well as individual edit-level selection.
Experiments on three popular GEC datasets and with state-of-the-art GEC systems
demonstrate the efficacy of the proposed MBR approach. Additionally, the paper
highlights how varying reward metrics within the MBR decoding framework can
provide control over precision, recall, and the F-score in combined GEC
systems.
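A minimal sketch of MBR candidate selection under an edit-based reward; `reward` stands in for an F-score-style metric such as an edit-level F0.5 and is an assumption, not the paper's exact loss:

```python
def mbr_select(candidates, reward):
    """Pick the candidate with the highest expected reward, treating the other
    system outputs as pseudo-references (uniform weights assumed)."""
    def expected_reward(cand):
        others = [c for c in candidates if c is not cand]
        return sum(reward(cand, ref) for ref in others) / len(others)
    return max(candidates, key=expected_reward)
```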
Identifying Adversarially Attackable and Robust Samples
Adversarial attacks add small, imperceptible perturbations to input
samples that cause large, undesired changes to the output of deep learning
models. Despite extensive research on generating adversarial attacks and
building defense systems, there has been limited research on understanding
adversarial attacks from an input-data perspective. This work introduces the
notion of sample attackability, where we aim to identify samples that are most
susceptible to adversarial attacks (attackable samples) and conversely also
identify the least susceptible samples (robust samples). We propose a
deep-learning-based method to detect the adversarially attackable and robust
samples in an unseen dataset for an unseen target model. Experiments on
standard image classification datasets enable us to assess the portability of
the deep attackability detector across a range of architectures. We find that
the deep attackability detector performs better than simple model
uncertainty-based measures for identifying the attackable/robust samples. This
suggests that uncertainty is an inadequate proxy for measuring sample distance
to a decision boundary. In addition to better understanding adversarial attack
theory, it is found that the ability to identify the adversarially attackable
and robust samples has implications for improving the efficiency of
sample-selection tasks, e.g. active learning for data augmentation in
adversarial training.
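A minimal sketch of how attackable and robust samples might be labelled by the smallest successful perturbation budget; the attack interface and the thresholds are assumptions for illustration:

```python
def label_sample(attack, model, x, y, eps_grid, eps_small, eps_large):
    """Label a sample by the smallest epsilon at which a reference attack
    flips the prediction: easy to flip => attackable, hard => robust."""
    for eps in sorted(eps_grid):             # increasing perturbation budgets
        if model.predict(attack(model, x, y, eps)) != y:   # attack succeeded
            if eps <= eps_small:
                return "attackable"
            return "robust" if eps >= eps_large else "neither"
    return "robust"                          # never flipped within the grid
```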
Predictive Uncertainty Estimation via Prior Networks
Estimating how uncertain an AI system is in its predictions is important to improve the safety of such systems. Uncertainty in predictions can result from uncertainty in model parameters, irreducible data uncertainty and uncertainty due to distributional mismatch between the test and training data distributions. Different actions might be taken depending on the source of the uncertainty, so it is important to be able to distinguish between them. Recently, baseline tasks and metrics have been defined and several practical methods to estimate uncertainty developed. These methods, however, attempt to model uncertainty due to distributional mismatch either implicitly through model uncertainty or as data uncertainty. This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs), which explicitly models distributional uncertainty. PNs do this by parameterizing a prior distribution over predictive distributions. This work focuses on uncertainty for classification and evaluates PNs on the tasks of identifying out-of-distribution (OOD) samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods. Experiments on synthetic and MNIST data show that, unlike previous non-Bayesian methods, PNs are able to distinguish between data and distributional uncertainty.
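A minimal sketch of how a Prior Network's Dirichlet parameters can be decomposed into data and distributional uncertainty, using the standard entropy/mutual-information decomposition; shapes and naming are assumptions:

```python
import torch

def decompose_uncertainty(alpha):
    """Split total uncertainty into data and distributional parts, given
    Dirichlet concentrations alpha of shape (batch, num_classes)."""
    a0 = alpha.sum(-1, keepdim=True)
    p = alpha / a0                     # expected categorical distribution
    total = -(p * p.log()).sum(-1)     # entropy of the expected prediction
    # expected entropy of categoricals drawn from Dir(alpha): data uncertainty
    data = -(p * (torch.digamma(alpha + 1.0) - torch.digamma(a0 + 1.0))).sum(-1)
    return total, data, total - data   # last term: distributional uncertainty (MI)
```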